# Large-scale Visual Representation
## Vit So400m Patch16 Siglip 256.webli I18n
- License: Apache-2.0
- Description: A Vision Transformer image encoder based on SigLIP, focused on image feature extraction and using the original attention-pooling mechanism.
- Tags: Image Classification, Transformers
- Library: timm
- Downloads: 15 · Likes: 0
## Vit Large Patch14 Clip 224.datacompxl
- License: Apache-2.0
- Description: A Vision Transformer image encoder based on the CLIP architecture, designed for image feature extraction and released by the LAION organization.
- Tags: Image Classification, Transformers
- Library: timm
- Downloads: 14 · Likes: 0
## Convnext Base.clip Laion2b Augreg
- License: Apache-2.0
- Description: A ConvNeXt-Base image encoder trained under the CLIP framework on the LAION-2B dataset; supports image feature extraction.
- Tags: Image Classification, Transformers
- Library: timm
- Downloads: 522 · Likes: 0
## Convnext Base.clip Laion2b
- License: Apache-2.0
- Description: A CLIP image encoder based on the ConvNeXt architecture, trained by LAION and suitable for multimodal vision-language tasks.
- Tags: Image Classification, Transformers
- Library: timm
- Downloads: 297 · Likes: 0
## Resnet50x64 Clip.openai
- License: MIT
- Description: A CLIP model built on the ResNet-50x64 architecture from the OpenCLIP library, supporting zero-shot image classification.
- Tags: Image Classification
- Library: timm
- Downloads: 622 · Likes: 0